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Introduction 

Latent variable models represent a useful tool in the social sciences where the analyzed 
constructs cannot be directly observed and, hence, they are not measurable. However, a set 
of indicators related to each unobserved variable can be measured. They are often coded 
into a number of ordered categories, so that latent variable models with ordinal variables 
have to be used. These models ca n be defined within the Generalized Linear Latent Var i- 
able Model (GLLVM) framework (|Bartholomew Knottl . Il999l : iMoustaki k Knottl . boOlh . 
according to which the entire set of the responses given by an individual to a certain number 
of items, called response pattern, is expressed as a function of one or more latent variables 
through a monotone differentiable link function. The estimation of the model parameters 
can be obtained by means of a full information maximum likelihood method via the EM 
algorithm, that guarantees quite accurate estimates 

The presence of the latent variables causes problems related to the integration of the likeli- 
hood function, since analytical solutions do not exist. In order to overcome this drawback, 
numerical approximations are usually appl ied. One of the most often used technique is the 
classical Gauss-Hermite (GH) quadrature (jBock fc Aitkinl . Il98ll ). that provides quite good 
parameter estimates when many quadrature points are considered per each latent variable. 
However, it becomes computational unfeasible as the number of latent variables increases. 
This represents a serious limitation for a large number of ap plications where several ob- 



served and latent variables are required (ICagnone et all . l200' 



api 



As alternative solution to GH, the Adaptive Gauss Hermite (AGH) qu adrature has been 
discussed for different models with random effects and /o r latent variables ( Pinhero &: Bate^ . 
1995l : lRabe-Hesketh et"all . l2005l : [Schilling Bockl . l2005l ). In all these studies, AGH is shown 
to perform better than GH, also when few quadrature points are used. Indeed, it consists 
of adjusting the GH nodes with the first and second moments of the posterior density of 
the latent factors given the manifest variables. This allows a better approximation of the 
function to be integrated. Nevertheless, the AG H is very computational in tensive, particu- 
larly in latent variable models for ordinal data ( Cagnone Sz Monari . 201 ll ). 
An approximation technique that is no t affected by the presence of high dirn ensional inte- 
grals is the Laplace method (jPe BruijnI . Il98ll : iBarndorff-Nielsen Sz Coxl. Il989l'l. that can b e 
viewed as a particular case of the AGH when just one abscissa is used (jLiu &: Piercd . 11994 ). 
Given its reduced dimensionality, the Laplace method is one of the fastest technique, since 
the computational burden depends only on the calculation of the mode of the integrand 
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dTiernev k Kadanj . Il986l : iRaudenbush et alJ . l200d : iHuber et all \2004 : IPinhero fc Clmo . 



20061 ) ■ However, the Laplace approximation has an error of order 0{p th at d e pend s 



only on the number of items p, hence it is not directly controllable. Moreover, Joe (j2008l 



has investigated its performance for a variety of discrete response mixed models, and he has 
found that it becomes less adequate as the degree of discreteness increases. 
When either the EM algorithm or a direct maximization of the observed data log-likelihood 
is used for model estimation, an extended version of the Laplace method, called Fully 
exponential Laplace Approximati on (FLA), can be applied. It has been introduced and 
developed by Tierney et al. ( 19891 ) in the Ba yesian context for appro ximating posterior dis- 



tributions. Recently, it has been extended by Rizopoulos et al. ( 20091 ) to a variety of models 



for longitudinal continuous measurements and time-to-event data estimated via the EM al- 
gorithm. The main idea proposed by these authors is to apply the FLA to the expected 
score function of the model parameters with respect to the posterior distribution of the 
latent variables. With the FLA, a better approximation of the multidimensional integrals 
is achieved, being the approximation error of order 0(p~^). Moreover, the computational 
complexity of this approach is similar to the classical Laplace method since it depends only 
on the numerical optimization required to compute the mode of the integrand. 
In this paper, we extend the FLA for the general class of latent variable models for ordinal 
data within the GLLVM context. In Section 2, the models for ordinal data are introduced, 
whereas in Section 3 the estimation problem is discussed, with particular attention to the 
fully exponential Laplace approximation. In Section 4, a simulation study is performed in 
order to compare the finite sample and asymptotic properties of the AGH and FLA under 
different conditions. Finally, Section 5 gives the conclusions. 

Model specification 

Let y be a vector of p ordinal observed variables each of them with Cj categories, and 
z be a vector of q latent variables. The q (i = 1, . . . ,p) ordered categories of the variables 
Hi have associated the probabilities 7rj^i(z), 7rj^2('2), • • • ,'7rj^Ci(z), which are functions of the 
vector of the latent variables z. 

Following the general scheme of the GLLVM framework, the probability associated to y is 
given by 

/(y) = / g{y\z)h{z)dz (i) 

where h{z) is assumed to be a multivariate standard normal distribution. g{y\z) is the con- 
ditional probability of the observed variables given z. It is assumed to follow a multinomial 
distribution 

p 

g{y\z)=llg{y,\z) (2) 



i=l 



where 



Expression ([2]) is obtained by assuming the conditional independence of the observed vari- 
ables given the latent variables. In expression ([3]), 7j,s(z) = 7rj^i(z) + 7ri^2(z) + . . . + 7rj^<j(z) 
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is the probability of a response in category s or lower on the variable i, and it is function of 
z. For simplicity, from now on we consider 7j.<j = 7j^s(z). y* ^ is equal to 1 if the response 
is in the category s or lower, and otherwise. 

As in the classical generalized linear model, the systematic component is defined as 



Vi,. 



n.. 



.7 = 1 



l,i = 1, 



,P 



(4) 



where rji^g is the linear predictor, and Tj^^ and Qjj can be interpreted as thresholds and factor 
loadings of the model. For the thresholds, the inequality — oo = Tj^o < < Ti,2 < • • • < 
Ti,Ci = +00 holds. Each factor loading aij measures the effect of the correspondent latent 
variable Zj on some function of the cumulative probability ^i^s- 

The relation between the systematic component and the conditional means of the random 
component distributions is given by r/j.<j = Vi^s{li,s), where Vi^s is the link function and can 
be any monotonic differentiable function. Here, we refer to the logit link function, so that 
eq. dl} is known as proportional odds model. However, other link functions can be chosen. 

Model estimation 



Model estimation is achieved by using the maximum likelihood through the EM algo- 
rithm, since the latent variables are unknown. At this regard, we apply a full information 
maximum likelihood method by which all the parameters of the model are estimated simul- 
taneously. 

For a random sample of size n, from equation ([T]), the observed data log- likelihood is defined 
as 



II It « 

L = y]log/(y;) = Vlog / g{yi \ zi)h{zi)dzi. 

1=1 1=1 Jr" 



(5) 



The EM algorithm consists of an Expectation step (E-step), in which the expected score 
function E{S{a.i)) of the model parameters a.[ = (rj^i, . . . r, 



i,Ci — lj Oiil ) • • • ) Oliq ) j ' 



1, 



,P, 



is computed. The expectation is with respect to the posterior distribution h{zi \ yi) of z 
given the observations for each individual. In the Maximization step (M-step), updated 
paraineter e stimates are obtained by equating to the expected score functions. 
LouisI (Il982l ) proved that maximizing the observed data score vector dL/dsLi is equivalent 
to maximize the expected score function E{S{a.i)) with respect to h{zi \ yi), so that 



dL_ 

da.i 



1=1 



! a{yi I zi)h{zi)dzi 



(6) 



^-^ / dlogg{yi I zi) 



1=1 



,p 



where 



dloggjyi I zi) 



C^-l 

E 

S = l 



yi,s,i yi,s+i,i 



';S,lizi)) 



dan 
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and 



Oi,s,lM = log 
b{9i^s,i{^i)) = log 



1i,s,l 



hi,s+l,l - 1i,s,l) 



l,i = 1, 



,P- 



1 , . . . , Cj 1 , i 



1, 



(7) 
(8) 



{11,8 + 1,1 - 7i,S,l) 

The expressions of the derivatives reported in the l a st equ ahty of ([7]) with respect to thresh- 
olds and loadings can be found in iMoustakil (j2000|, l2003l ) . 

From eq. ([6]) , it can be noticed that the computation of the expected score functions involves 
a multidimensional integral that cannot be solved analytically, hence numerical approxima- 
tions are required. In particular, in the following, we propose the use of an extended version 
of the classical Laplace approximation, that is the fully exponential Laplace method. 

Fully exponential Laplace approximation method 

The FLA method has been proposed for the first time bv lTiernev et al. (|l989l l in order 
to approximate posterior distributions in the Bayesian context. It represents an extension 
of the classical Laplace approximation that, as known, is based on the second order Taylor 
expansion of the logarit hm of the integrand , with the latent variables evaluated at the mode 
(see, among the others, Tierney Sz Kadane ( 19861 )). 

The Laplace method has the advantage of dealing with integrals of any dimensionality 
without introducing computational problems but, for the general class of latent variable 
models discussed in this paper, it produces an approximation error of order 0(p~^), that 
can be reduced only increasing the number of observed variables. The FLA leads to an 
improvement of the approximation error maintaining the same computational complexity 
as the classical Laplace method. The extension of FLA to joint mode ls for continuous 



longit udinal measurements and time-to-event data has been proposed by iRizopoulos et al 
(|2o3). [t requires the computation of the following quantities 



E{Aizi)) = / A{zi)h{zi I yi)dzi 



J A{zi)g{yi I zi)h{zi)dzi 



(9) 



/ g{Yi I zi)h{zi)dzi 

that differ from ([6]) since A[-) are the components of the score functions 5(aj) that depend 
on the latent variables. 

The main idea of FLA is to approxi mate both the numerator and the denominator in eq. ([9]) 
with the classical Laplace method. iTiernev &: Kadane (|l986l l proved that the error terms of 
order 0{p~^) in the numerator and the denominator cancel out, leading to a smaller error 
term of order 0(p~^). 

To extend the FLA to the proportional odds model discussed in this paper, we have 
to take into account for the derivatives ([7]) with respect to the thresholds and the loadings, 
that are characterized by different A{zi) components. In more detail, from the derivatives 
of the logarithm of g{yi^i \ z;) with respect to the thresholds we get 



d9. 



.3-1,1 



dn. 



li,s,l), 

11,3,1)-^ 



if s 
if s 



1 Ci 



^2,j,s(zz) 



db{ei,s,M)) 



(1 - li,3,l) 



li,s,l 



s+l,l 



li,3,l) 



1. 
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From the derivatives of the logarithm of g{yi^i \ zi) with respect to the loadings ctj 
{aii,...,aig), we get 

d\ogg{yi^s,i\2i) ( {l--fi^s,i)zi, if s = 1; 



dcti I (1 - 1i,s,i - 7j,s-i,/)z/, if s = 2, . . . , Cj. 

The FLA approximation can be applied only to strictly positive functions A{-). In our 
case, this condition is not necessarily guaranteed since the A(-) are components of the score 
functions, not constrained to be positive. To overcome this problem, the method of the 
moment generating function can be used. According to this approach, since the quantity 
exp{t'A(zi)} is always positive, the FLA approximation can be applied to the moment 
generating function M(t) = E[exp{t' A{zi))} , with latent variables z; evaluated at the mode 
zi = argmaxzjlog (7(yi | z;) + log/i(z;) +t'A{zi)]. In doing so, we get the approximate 
moment generating function M(t). Hence, from the corresponding cumulant-generating 
function logM(t), we obtain the approximate expected values E{A{zi)). These latter are 
the quantities of interest, and they are given by 

EiA{zi)) = -logM{t)\t=o = ^log^[exp{t'A(zO)}][^^. (10) 



Tiernev et al. (|l989l l proved (Theorem 2, pag. 712) that eq. (fTOj) is equivalent to the 



following expression 



Ot (z;=Z;,t=0) 

- Aizi)-hr{il) _ ,+0{p-') 

(zi=zi,t=0) 



2 

where 



_ 9'{log gjyi \zi) + log hjzi) + t'Ajzi)} _ 



= [-yls,ni,s+iA'^ - ii,s+i,i) + yls+i,ni,sA'^ - 7i,s,i)] } + i- ^Lfg^^^ 

i=l s=l ' ^ 

and 

The expressions of the first derivatives of Xl; with respect to t are reported in the Appendix. 
EM algorithm 

The steps of the EM algorithm are defined as follows: 



1. Choose initial values for the parameters a.'- = (tj^i, . . . Tj^c—i, Oii, • • • , "ig) i = 1, • • • ,p- 
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2. Compute the mode zi,l = 1, . . . , n, by using a Newton Raphson iteration scheme. In more 
detail, for the (m)-th iteration 

-^i ) ^i^l )JI(,,=^(™-i),t=o) 

where 5]/ is the Hessian matrix defined in expression (]12p and ^(z;) is defined as follows 

^ a{log g{yi I Xi) + log /i(z;) + t'Ajzi)} 

^W) = = (13) 

ozi 

= ^ {q^ [yls,ni,s+i,i - yls+i,ni,s,i] } + - fi^'^ • (i4) 

1=1 s=l '■ 

3. E-step. Compute the FLA expected values E{Ai^i^s{zi)), E{A3^i^s{2i)), for s = 1, . . . , Cj — 
l,i = l,...,p, and -B(^2,i,s-i(z;)) for s = 2, ...,Ci,i = l,...,p, and the approximate 
expected score function E{S{ai)), where i = 1, . . . ,p. 

4. M-step. Obtain improved estimates for the model parameters a- = 
(tj^i, . . . Ti^a-i, Oil) • • • ) ctiq), i = 1; ' " " )P- ^oi all of them, a Newton Raphson iter- 
ative scheme is used in order to solve the corresponding nonlinear maximum likelihood 
equations. 

5. Repeat steps 2-3-4 until convergence is attained. 

Simulation study 

The properties of the FLA method for the proportional odds model can be evaluated 
by performing a simulation study in which several conditions are taken into account. The 
results will be compared with those obtained using the AGH quadrature. In recent years, the 
latter has been widely applied in latent variable models, since it allows to obtain estimates 
that are as accurate as those derived by the GH technique, but using a small number of 
quadrature points. It essentially consists of scaling and translating the classical Gaussian 
quadrature locations to place them under the peak of the integrand, and two different 
procedures have been adopted in the literature. According to the first one, the mode of the 
integrand and t he inverse of the information matrix o f the integrand evaluated at the mode 
are computed (|Liu fc Piercel . Il994l : IPinhero fc Batej . Il995l : ISchilling fc Bockl . 120051 ). The 



advantage of this approach lies in the fact that the quadrature points are not involved in 
these computations. However, this method is computationally demanding since it requires 
numerical optimization routines and the computation of second derivatives. Moreover, when 
parameter estimates are obtained by using iterative algorithms, like in our case, the first 
and second order moments have to be computed at each step, hence the algorithm becomes 
very slow. 

An alternative procedure consi sts of computing the poster ior means and covariance matrices 



at each step of the algorithm (jRabe-Hesketh et al.l . l2005l ) . Although this method requires 



the use of quadrature points themselves, the posterior moments should better describe the 
integrand in those cases in which its tails are heavier than the normal density. In the 
following, we show how both these techniques work in latent variable models for ordinal 
data, and we compare their performances with FLA. 

The softwares used for the analyses are Fortran 95 and R. The codes are available from the 
authors upon request. 
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Table 1: Mean, bias and MSE of the parameter estimates for FLA, AGUme, and AGHmo in the 
generated data. 



% valid samples 87 

True Mean Bias MSB 



ail 
°'21 



"42 
"52 



= 1.03 
= 1.44 
= 2.11 
= 1.80 
= 1.53 
= 0.00 
= 2.42 
= 1.52 
= 0.75 
= 1.34 



1.52 
1.48 
2.19 
1.81 
1.53 

2.04 
1.62 
0.78 
1.50 



0.49 
0.04 
0.08 
0.01 
0.00 

-0.38 
0.10 
0.03 
0.16 



0.96 
0.53 
0.82 
0.63 
0.42 

0.60 
0.56 
0.44 
0.38 



84 

Mean Bias MSE 



1.52 
1.20 
1.88 
1.65 
1.36 

2.10 
1.82 
0.96 
1.63 



0.49 
-0.24 
-0.23 
-0.15 
-0.17 

-0.33 
0.30 
0.21 
0.29 



0.68 
0.38 
0.44 
0.44 
0.27 

0.41 
0.45 
0.36 
0.41 



Mean Bias MSE 

1.05 0.02 0.03 

1.37 -0.07 0.14 

2.08 -0.03 0.31 

1.44 -0.36 0.21 

1.54 0.01 0.14 

2.04 -0.38 0.35 

2.02 0.50 0.51 

1.11 0.36 0.18 

1.76 0.42 0.35 



Finite sample properties of the estimators 

To investigate empirically the finite sample performance of the FLA and AGH, based 
on both the posterior mean (AGHme) and mode (AGHmo), we generated data from a pop- 
ulation that consists of five variables and satisfies a two factor model. The number of 
categories is the same for each observed variable, and equal to 4. 100 random samples were 
considered with n = 200 subjects. We chose 5 quadrature points per each latent variable 
for both the adaptive approximations. We also considered 7 quadrature points, but there 
was a little difference with 5 nodes, suggesting that the latter provides sufficient accuracy 
for this example. 

The population parameters were chosen in such a way that the thresholds range from 
-3 to 3. The factor loadings are the following: ai = (1.03,1.44,2.11,1.8,1.53) and 
a2 = (0,2.42,1.52,0.75,1.34) with not null values generated from a log-normal distribu- 
tion, and one loading fixed to to get a unique solution. 

Table [U reports the mean, bias, and Mean Square Error (MSE) of the parameter estimates 
obtained by applying all the techniques. The results show that the percentage of valid sam- 
ples is quite high for all the procedures, ranging from 76% to 87%. The FLA presents much 
better MSE values than those achieved by AGH^e and AGH-mo, mainly due to a smaller 
variability of the estimates. Comparing the adaptive techniques, AGH^e estimates are less 
biased than those determined by AGUmo, and present an opposite sign of the bias for ai. 
On the other hand, the latter behaves better in terms of MSE values. 
The different performance of the two adaptive techniques can be due to the fact that the in- 
dividual posterior densiti es to be appro ximated are not always symmetric. In latent variable 
models for ordinal data, Chang ( 19961 ) proved that the posterior densities asymptotically 
follow a multivariate normal distribution. However, for a small number of observed vari- 
ables, the integrand could be skewed, and the numerical procedures could provide quite 
different results. 

To analyze the shape of the individual posterior densities in the generated population, we 
computed measures of multivariate skewness /3i^q and kurtosis /32,5 proposed by Mardia 
(1970). In the case of two latent variables, they are given by 
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and 

^2,2(0=^04 + ^40 + 2^22 I = 1, ...,n 

with fiij = E{z\iZ2i), whereas zu and Z21 are the latent factors standardized with respect to 
the posterior densities. Mardia (1970) also derived the asymptotic distributions of both f^i^g 
and /32,5, and the corresponding statistical tests to evaluate the null hypotheses Hq : /3i^g = 
and Hq : ^02,9 = q{q + 2), being q{q + 2) the kurtosis in g-variate normal densities. 
By computing these measures for the individual posterior densities generated in this simula- 
tion study, we observed that about 35% of these functions have a significant skewness, and 
a kurtosis always not significantly different from 8. In particular, /3i^2 is on average equal 
to 0.044 and it ranges from 0.000 to the significant value 0.168. The presence of individual 
posterior densities having different shapes could justify the different behavior of AGHs and 
FLA. In Figure [U we show two different functions obtained from our generated data. 

Figure 1. Individual posterior densities with different shapes (on the left side /3i^2=0.000, and on 
the right side 2=0. 168) in the generated population of 200 subjects. 




In order to better analyze the finite sample properties of FLA and AGHs, we also generated 
data from two hypothetical extreme scenarios: one in which all the posterior densities are 
symmetric, and another one in which a high percentage (more than 60%) of the densities 
are skewed. As before, we consider five observed variables, each with 4 categories, satisfying 
a two factor model. The results for both the populations are shown in Table [21 
In the first scenario, the thresholds for each item are equal to -2 for the first category, for 
the second, and 2 for the third one, whereas the loadings are all fixed to 0.5 except one set 
equal to zero. In this population, all the individual posterior densities are symmetric, with 
/3i^2 on average equal to 0.005, and /32,2 always not significantly different from 8. As in the 
previous simulation study, we generated 100 random samples with 200 subjects. 
For all the samples the algorithm achieves the convergence for FLA and AGHmej and in the 
96% of the cases for AGHmo- The FLA improves a lot with respect to the previous case, 
with a reduction of almost one digit in the MSE values, mainly due to smaller bias values 
for Q2- On the other hand, both the AGH techniques provide better results in terms of bias 
and MSE, even if they still perform worse than FLA. We can also notice that the results 
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Table 2: Mean, bias and MSE for FLA, AGH,„e, and AGH„io for different scenarios in finite samples. 



% valid samples 




100 






96 






100 




True 


Mean 


Bias 


MSE 


Mean 


Bias 


MSE 


Mean 


Bias 


MSE 


ail = 0.5 


0.78 


0.28 


0.59 


0.79 


0.29 


0.62 


0.59 


0.09 


0.02 


a21 — 0.5 


0.46 


-0.04 


0.39 


0.50 


-0.00 


0.46 


0.61 


0.11 


0.02 


«31 ~ 0.5 


0.43 


-0.07 


0.50 


0.41 


-0.09 


0.48 


0.60 


0.10 


0.02 


a4i — 0.5 


0.52 


0.02 


0.78 


0.54 


0.04 


0.77 


0.60 


0.10 


0.02 


"51 — 0.5 


0.61 


0.11 


0.67 


0.60 


0.10 


0.64 


0.61 


0.11 


0.02 


ai2 = 0.0 




















^22 — 0.5 


0.81 


0.31 


0.80 


0.83 


0.33 


0.82 


0.60 


0.10 


0.01 


0132 — 0.5 


0.64 


0.14 


0.79 


0.58 


0.08 


0.66 


0.59 


0.09 


0.01 


^42 — 0.5 


0.80 


0.30 


0.77 


0.77 


0.27 


0.85 


0.59 


0.09 


0.01 


^52 = 0.5 


0.72 


0.22 


0.61 


0.74 


0.24 


0.65 


0.59 


0.09 


0.01 










AGH 








FLA 








mean 






mode 










% valid samples 




83 






27 






35 




True 


Mean 


Bias 


MSE 


Mean 


Bias 


MSE 


Mean 


Bias 


MSE 


ail = 2.5 


2.49 


-0.01 


0.31 


2.35 


-0.16 


0.18 


1.82 


-0.68 


0.60 


c<2i = 2.5 


2.74 


0.24 


0.37 


2.56 


0.06 


0.15 


2.42 


-0.08 


0.19 


^31 = 2.5 


2.79 


0.29 


0.58 


2.47 


-0.03 


0.16 


2.36 


-0.14 


0.16 


(^41 = 2.5 


2.63 


0.13 


0.24 


2.57 


0.07 


0.12 


2.41 


-0.09 


0.08 


^51 — 2.5 


2.75 


0.25 


0.41 


2.61 


0.11 


0.14 


2.37 


-0.13 


0.15 


ai2 = 0.0 




















tk22 — 1.0 


1.04 


0.04 


0.44 


0.90 


-0.10 


0.35 


1.86 


0.86 


0.81 


0132 — 1.0 


1.26 


0.26 


0.52 


0.94 


-0.07 


0.34 


1.82 


0.82 


0.76 


^42 = 1.0 


1.07 


0.07 


0.50 


1.09 


0.09 


0.65 


1.90 


0.90 


0.87 


ot^2 ~ 1.0 


1.23 


0.23 


0.65 


0.94 


-0.06 


0.61 


1.87 


0.87 


0.80 



provided by the two adaptive procedures are almost the same, with an equal sign of the bias 
for all the estimates, and slight dis crepancies due to the differ ent computational techniques 
involved. Indeed, as discussed by Rabe-Hesketh et al. ( 20051 ). the two procedures should 



provide similar results when the posterior densities are symmetric. 

In the second scenario, the thresholds for each item are equal to -1 for the first category, 
for the second, and 1 for the third one, whereas the loadings are fixed equal to cxi = 
(2.5, 2.5, 2.5, 2.5, 2.5) and 0L2 = (0, 1, 1, 1, 1). In this case, the 65% of the posterior densities 
are skewed. /3i^2 ranges from 0.000 to 0.239, being the latter significantly different from 
zero, and it is on average equal to 0.127. On the other hand, there is not significant kurtosis 
for all the subjects. The main consequence of this high percentage of skew densities is that, 
for both FLA and AGH 

moi 3- very small number of samples (27% for the former, 35% for 
the latter) converge properly. Hence, even if the results are similar to the ones obtained 
in the first simulation, they are not reliable. On the other hand, AGH^e seems to be not 
affected by the different shapes of the posterior densities. It results more stable in terms of 
mean, bias, and MSE of the estimates as well as in terms of percentage of valid samples, 
that also in this case is 83%. 

From these results, we can argue that FLA will be superior than AGH when the majority of 
the posterior densities is symmetric. In these cases the former provides better MSE values 
for the estimates than the latter, mainly due to a reduced variability in the estimates. 
Moreover, the bias introduced in the estimates using FLA is quite comparable with the one 
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Table 3: Mean, bias and MSE for FLA, AGH,„e, and AGH„o for generated data with n = 1000. 



mean 



mode 



% valid samples 
True 



ail 
an 
"22 



= 1.03 
= 1.44 
= 2.11 
= 1.80 
= 1.53 
= 0.00 
= 2.42 
= 1.52 
= 0.75 
= 1.34 



Mean 

1.11 
1.43 
2.12 
1.85 
1.55 

2.17 
1.56 
0.74 
1.39 



.97 

Bias 

0.08 
-0.01 
0.01 
0.05 
0.02 

-0.25 
0.04 
-0.01 
0.05 



MSE 

0.11 
0.10 
0.09 
0.12 
0.06 

0.20 
0.11 
0.09 
0.07 



Mean 

1.13 
1.39 
2.09 
1.83 
1.52 

2.23 
1.61 
0.78 
1.42 



.92 

Bias MSE 



0.10 
-0.05 
-0.02 
0.03 
-0.01 

-0.19 
0.09 
0.03 
0.08 



0.09 
0.10 
0.09 
0.13 
0.06 

0.19 
0.12 
0.09 
0.07 



Mean Bias MSE 



1.04 
1.39 
2.15 
1.50 
1.59 

2.16 
2.04 
1.05 
1.73 



0.01 
-0.05 
0.04 
0.05 
0.05 

-0.26 
0.52 
0.30 
0.39 



0.01 
0.04 
0.07 
0.04 
0.04 

0.15 
0.32 
0.10 
0.20 



in the AGH estimates. On the other hand, we have also shown that the AGHme provides 
more stable results, that are not affected by the shape of the integrand. Its use is then 
suggested in populations characterized by a high percentage of skew distributions. 

Asymptotic properties of estimators 

The asymptotic pro perties of the Lapl a ce ma ximum likelihood estimators 6 have been 



derived and discussed by iRizopoulos et al 
these authors showed that 



Under suitable regularity conditions. 



On — Or, 



max n '^^'^iP ^ 



where denotes the true parameter value. 9 will be consistent as long as both n and p 
grow to CO. FLA is superior than standard Laplace method, the latter producing estimators 



with an appr o ximat ion e rror of order O p [max | n ^^'^,p ^}]. On the other hand, following 
Lhi fc Piercj (|l99i) and IXiernev et all ([l989), it can be shown that FLA shares the same 
approximation error of the AGH with 5 quadrature points. 

To assess the asymptotic accuracy of the FLA estimators, we generated 100 random samples 
with 1000 subjects from the population described in the previous section. We also applied 
both the adaptive techniques, and the results are shown in Table El 

The percentage of valid samples is high for all the techniques, ranging from 92% to 99%. 
FLA has a good performance as before with small MSE and bias values. On the other 
hand, both AGHs have an analogous behavior: the MSE values are drastically reduced 
with respect to the finite sample situation, and the bias is small for all the parameters. 
The three techniques present a very similar asymptotic behavior. Moreover, it is worth 
noting that FLA performs better than the classical Laplace approximation. Indeed, the 



asymptotic bias of the latter is higher than the one corresponding to AGH (jjod . l2008l ). 
whereas the bias in the AGH and FLA estimates is quite comparable, with a slight better 
performance of the former for the second factor loadings (Table [3]). 



Discussion 



This paper is concerned with the adequacy of several approximations of the likeli- 
hood function in latent variable models for ordinal data. In particular, we proposed an 
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extended version of the Laplace method for approximating integrals, known as fully ex- 
ponential Laplace approximation. Classic al Laplace methods are known to work poorly 
in presence of discrete response variables ( Joe . 20081 ) . but we have shown how the FLA 



is generally appropriate in models for ordinal data in both finite and large samples. The 
comparison with the adaptive Gauss Hermite quadrature techniques has highlighted that in 
finite samples the FLA provides better results in terms of MSE values when the majority of 
the posterior densities is symmetric. Indeed, for a small number of observed variables, the 
symmetry of the individual posterior densities is not always guaranteed, and the percentage 
of skew distributions tends to vary according to the parameter values. When the majority 
of the densities are skewed, FLA and AGHmo do not achieve converge in most cases. On 
the other hand, AGUme is more stable, and it is not affected by the shape of the functions 
to be approximated. 

The main strength of the FLA approach is that it effectively copes with high dimensional 
latent structures without increasing substantially the computational burden. This is one of 
the main drawbacks in the application of AGH techniques in latent variable models. Five 
quadrature points can provide accurate estimates, but the computational effort increases 
exponentially as the number of latent factors increases. Furthermore, in large samples, the 
FLA achieves the same approximation of the AGH with five quadrature points, and all the 
techniques behave similarly. 

The main limitation of the FLA approach is that it is not possible to control the magnitude 
of the approximation error of the integral, as done in AGH by mod ifying the number of 
quadrature points. However, as discussed by Rizopoulos et al. ( 20091 ). a virtue of the fully 
exponential Laplace approximation is that it is very general, and it can be used in almost 
all the general linear latent variable models. Overall, for latent variable models with ordinal 
data, the FLA is very adequate to approximate the likelihood function, and it should be 
considered as a valid alternative to adaptive Gaussian quadrature techniques. 
Further lines of research will be oriented to compare the performance of FLA with the mul- 
tidimesioi ial splines. The lat t er rep resents a useful alternative to approximate the posterior 
densities ( Thissen &: Woodi . 20061 ) and to investigate the main assumptions on the prior 
distribution of the lat e nt va riables that is still an open issue in the GLLVM framework 
(|Knott fc Tzamouranil . l2007l V 
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Appendix 

In order to apply the fully exponential Laplace approximation, the first derivative of J^i 
with respect to t has to be computed. At this regard, we make use of the following result 



it) 



dt 



(Zi=z;,t=0) 



dx dt 



{t=o) 



according to which 



(t) 



dt 



where A'{zi) 



{zi=^l,t=0) 



i=l s=l 

yi,s+l,ni,s,l{i - ^li,s,l + H,s,l)] X OCi^i^A'izi) - A"{zi), 



dzj 



Z;=Z; 



and A"izi) - 



dz'.dzi 



For the thresholds, the first-order partial derivatives result 

A,i,sM = Otai,s,l{l - li,s,l), S = 1, . . . , Cj - 1 

and 

^2,i,s(Zi) = -Oiili,s,l{^ - li,s,l) S = l,...,Ci-\. 

for Ai j ^(z;) and A2,j,s(z;), respectively. Furthermore, the corresponding second-order par- 
tial derivatives are given by 

^M,s(zO = -oc'iOcai^sA'^ - 2,-ii^s,l + 27j>,i), s = \,...,Ci-l 



and 



^2,i,s(zO = «iQ!i7i,s,«(l - 37i,s,i + 27j- sj) s = 1, • • ■ , Q - 1. 



As for the loadings, the elements of the gradient A^^i^si^i) with respect to the latent 
variables result 



dzki 



dA3,i,s(zji) _ / (1 -7i,i,i)(l + 7j,i,«"u^iO> ifs = l; 

(1 - li,s,i - li,s-i,i) + aijZji{-fi^s,li'i- - li,s,l) + 7i,s-i,Kl - li,s-,l)), if s = 2, . . . , ci. 
OiikZjai,i,i{l - if s = 1; 

aikZji{'li,s,i{^ - 7m,«) + - li,s~,l)), if s = 2, . . . ,d 



On the other hand, the elements of the corresponding Hessian matrix are given by 



dp 



aii7i,i,«(l - 7j,i,;)(2 - - 27i,ij)); if s = 1; 

aii[7M,Kl - 7i,s,i)(2 - aijZji{l - 2ji^s^i))+ if s = 2, 

. +7m-i,K1 - 7m-m)(2 - "y^jKl - '^7i,s-i,l))], 



, ci. 
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( OiikJi,u{l-7i,i,i){l-aikZji{l-2ji^i^l)), if.s = l; 

dz-i'dzki = \ «ifc[7M,z(l -7i,s,/)(l - - 27i,s,0)+ ifs = 2,...,d. 

d-'As^i (zji) ^ f -a|2j77i,i,Kl - 37i,i,z + 27»^,i,z), if s = 1; 

I -«^^(7M-M-37t-ii + 27i!.-i, + 7i,.,z-37^,, + 273 ), ifs = 2,...,ci 



